Byzantine Replication for Trustworthy Systems
Abstract
A trustworthy networked information system (NIS) must continue to operate correctly even in the presence of environmental disruptions, human errors, and hostile attacks [10]; in other words, it must be both fault-tolerant and secure. Reasoning about such a system is, unfortunately, very hard. Our limited success to date in building reliable NISs indicates the complexity involved in guaranteeing "just" fault tolerance; once security is added, complexity runs the risk of becoming unmanageable. To compound the challenge, fault tolerance and security do not seem to integrate nicely. On the one hand, they often duplicate efforts, since both are concerned with maintaining data integrity and availability. On the other hand, they can be at odds with each other: for instance, the very replication that improves data integrity and availability against faults harms confidentiality.
An attractive approach toward managing this complexity is to model a component whose security has been compromised as faulty according to the Byzantine failure model. A system can then be hardened to guarantee its correct operation even if a subset of its components, up to a given threshold, suffer Byzantine faults. This approach can potentially yield significant advantages. First, it holds the promise of simplifying reasoning about trustworthy systems by making security a byproduct of fault tolerance. Second, it opens the possibility of leveraging for security purposes the large body of existing research in Byzantine fault tolerance (BFT). Third, it suggests that it may be possible to assemble untrustworthy components into a trustworthy system, just as traditional fault-tolerance protocols assemble unreliable components to build reliable systems. This is especially attractive given the economics of today's marketplace, where few components are rigorously tested or verified.
At the same time, this approach raises several concerns. First, Byzantine fault tolerance is notoriously expensive.
It is not just that Byzantine protocols, to operate correctly, typically require participants to engage in numerous rounds of communication; much more importantly, they require a very high degree of replication. The cost associated with this replication may be prohibitively high, especially when considering that n-version programming (or opportunistic n-version programming) may be required to reduce the possibility of a large number of correlated Byzantine faults caused by a single security exploit.
Second, traditional Byzantine fault-tolerance techniques do not address confidentiality, which is a crucial aspect of security. Indeed, the replication required by most existing Byzantine fault-tolerance techniques can actually hurt confidentiality.
Third, the very soundness of modeling nodes compromised as a result of a security attack as if they had suffered Byzantine faults is problematic. The ability to choose an accurate value for f, the maximum number of faults that can occur at any time, is critical to the safety of any BFT protocol. When Byzantine fault tolerance is not used against security attacks, it is reasonable to treat Byzantine failures as independent and compute f accordingly. But it is not obvious how to compute f so that the system is safe when an attacker finds a new vulnerability in an operating system; when that happens, the probability that all replicas running the same OS will shortly be compromised increases sharply. It is not clear whether the diversity introduced through n-version programming is sufficient to reduce significantly the probability of this type of correlated fault.
Our research agenda over the last three years has been to determine whether or not hardening distributed systems against Byzantine faults is a practical and sound approach towards increasing trustworthiness. We have focused on two architectural primitives, state machines and quorum systems, with encouraging results [1, 3, 2, 8, 5, 6, 11].
We are currently moving our agenda forward along three directions.
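The replication cost and the sensitivity to the choice of f discussed above can be made concrete with a small sketch. It assumes the classic bound of n >= 3f + 1 replicas used by many asynchronous BFT state-machine replication protocols; the abstract itself does not state a specific bound, and the function names and platform labels are hypothetical, chosen only for illustration.

```python
# Illustrative sketch only. Assumes the classic n >= 3f + 1 replica bound
# for asynchronous BFT state-machine replication (an assumption; the
# abstract does not name a bound). All names here are hypothetical.

def min_replicas(f):
    """Smallest replica count that tolerates f Byzantine faults under n >= 3f + 1."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def tolerated_faults(n):
    """Largest f such that n >= 3f + 1."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1) // 3


def survives_exploit(replica_platforms, exploited_platform):
    """Return True if an exploit that compromises every replica running
    exploited_platform leaves the number of faulty replicas within f."""
    f = tolerated_faults(len(replica_platforms))
    compromised = sum(1 for p in replica_platforms if p == exploited_platform)
    return compromised <= f


if __name__ == "__main__":
    # Replication cost grows linearly with the fault threshold.
    for f in range(1, 4):
        print("f =", f, "requires at least", min_replicas(f), "replicas")

    # Four replicas tolerate one fault, but only if faults are not correlated.
    homogeneous = ["linux"] * 4
    diverse = ["linux", "bsd", "solaris", "windows"]
    print("homogeneous survives:", survives_exploit(homogeneous, "linux"))
    print("diverse survives:", survives_exploit(diverse, "linux"))
```

Under these assumptions, a single operating-system exploit defeats the homogeneous deployment because it compromises more than f replicas at once, while the fully diverse deployment stays within the threshold. Whether real n-version diversity achieves this degree of independence is exactly the open question the abstract raises.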
Similar resources
A Correctness Proof for a Practical Byzantine-Fault-Tolerant Replication Algorithm
We have developed a practical algorithm for state-machine replication [7, 11] that tolerates Byzantine faults. The algorithm is described in [4]. It offers a strong safety property — it implements a linearizable [5] object such that all operations invoked on the object execute atomically despite Byzantine failures and concurrency. Unlike previous algorithms [11, 10, 6], ours works correctly in ...
ASTRO: Autonomous and Trustworthy Data Sharing
We present ASTRO, a Byzantine fault tolerant data sharing service for mobile computing environments. ASTRO is the first system to support disconnected operation and opportunistic data sharing among potentially Byzantine nodes while continuing to provide precise and useful consistency guarantees to correct nodes. Specifically, ASTRO supports fork-causal consistency, a new consistency semantics t...
Active Quorum Systems
This paper outlines a flexible suite of object replication protocols that brings together Byzantine quorum systems registers and state machine replication. These protocols enable the implementation of Byzantine fault-tolerant applications that make minimal assumptions about the environment and that run in at most two more communication steps in almost all cases of non-favorable executions (in c...
Abstractions for Devising Byzantine-Resilient State Machine Replication
State machine replication is a common approach for making a distributed service highly available and resilient to failures, by replicating it on different processes. It is well-known, however, that the difficulty of ensuring the safety and liveness of a replicated service increases significantly when no synchrony assumptions are made, and when processes can exhibit Byzantine behaviors. The cont...
Byzantine Fault Tolerance Can Be Fast
Byzantine fault tolerance is important because it can be used to implement highly available systems that tolerate arbitrary behavior from faulty components. This paper presents a detailed performance evaluation of BFT, a state-machine replication algorithm that tolerates Byzantine faults in asynchronous systems. Our results contradict the common belief that Byzantine fault tolerance is too slow ...